Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration
Authors
Abstract
Researchers and scientists have increasingly used crowdsourcing platforms to collect labeled training data in recent years. The process is cost-effective and scalable, but research has shown that the quality of truth inference is unstable due to worker bias, worker variance, and task difficulty. In this demonstration, we present a hybrid system, named IDLE (Integrated Data Labeling Engine), that brings together a well-trained troop of domain experts and the multitudes of a crowdsourcing platform to collect high-quality training data for industry-level classification engines. We show how to acquire high-quality labeled data through quality control strategies that dynamically and cost-effectively leverage the strengths of both domain experts and crowdsourcing.
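To make the truth-inference problem concrete, the following is a minimal, hypothetical sketch (not IDLE's actual algorithm) contrasting naive majority voting with voting weighted by per-worker accuracy estimated on a small set of expert-labeled gold items; all worker IDs, labels, and accuracy values are illustrative assumptions.

```python
# Hypothetical illustration of crowd label aggregation -- not IDLE's method.
# Majority voting is sensitive to worker bias; weighting votes by reliability
# estimated from expert-labeled gold items (assumed here) can stabilize results.

from collections import defaultdict

def majority_vote(labels_per_item):
    """labels_per_item: {item_id: [(worker_id, label), ...]} -> {item_id: label}"""
    result = {}
    for item, votes in labels_per_item.items():
        counts = defaultdict(int)
        for _, label in votes:
            counts[label] += 1
        result[item] = max(counts, key=counts.get)
    return result

def weighted_vote(labels_per_item, worker_accuracy):
    """Weight each worker's vote by accuracy estimated on expert-labeled gold items."""
    result = {}
    for item, votes in labels_per_item.items():
        scores = defaultdict(float)
        for worker, label in votes:
            # Workers with no gold-item estimate count as chance-level (assumption).
            scores[label] += worker_accuracy.get(worker, 0.5)
        result[item] = max(scores, key=scores.get)
    return result

if __name__ == "__main__":
    crowd_labels = {
        "doc1": [("w1", "spam"), ("w2", "spam"), ("w3", "ham")],
        "doc2": [("w1", "ham"), ("w2", "spam"), ("w3", "spam")],
    }
    # Per-worker accuracies estimated against expert adjudications (toy values).
    accuracy = {"w1": 0.9, "w2": 0.55, "w3": 0.6}
    print(majority_vote(crowd_labels))
    print(weighted_vote(crowd_labels, accuracy))
```

In a hybrid workflow like the one described above, the expert-labeled gold items would serve double duty: estimating worker reliability and adjudicating items where the weighted vote remains ambiguous.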
Similar Resources
Ontology Quality Assurance with the Crowd
The Semantic Web has the potential to change the Web as we know it. However, the community faces a significant challenge in managing, aggregating, and curating the massive amount of data and knowledge. Human computation is only beginning to serve an essential role in the curation of these Web-based data. Ontologies, which facilitate data integration and search, serve as a central component of t...
Effectively Crowdsourcing Radiology Report Annotations
Crowdsourcing platforms are a popular choice for researchers to gather text annotations quickly at scale. We investigate whether crowdsourced annotations are useful when the labeling task requires medical domain knowledge. Comparing a sentence classification model trained with expert-annotated sentences to the same model trained on crowd-labeled sentences, we find the crowdsourced training data...
Leveraging non-expert crowdsourcing workers for improper task detection in crowdsourcing marketplaces
Controlling the quality of tasks, i.e., propriety of posted jobs, is a major challenge in crowdsourcing marketplaces. Most existing crowdsourcing services prohibit requesters from posting illegal or objectionable tasks. Operators in marketplaces have to monitor tasks continuously to find such improper ones; however, it is very expensive to manually investigate each task. In this paper, we prese...
Expert Discovery: A web mining approach
Expert discovery is the quest to answer the question: "Who is the best expert on a specific subject in a particular domain, within a peculiar array of parameters?" Experts with domain knowledge are crucial for consulting in industry, academia, and the scientific community. The aim of this study is to address the issues of the expert-finding task in a real-world community. Collabor...
Finding Patterns in Noisy Crowds: Regression-based Annotation Aggregation for Crowdsourced Data
Crowdsourcing offers a convenient means of obtaining labeled data quickly and inexpensively. However, crowdsourced labels are often noisier than expert-annotated data, making it difficult to aggregate them meaningfully. We present an aggregation approach that learns a regression model from crowdsourced annotations to predict aggregated labels for instances that have no expert adjudications. The...
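As a hedged illustration of the regression-based aggregation idea described in the last entry above (not the cited paper's actual method), the sketch below derives simple vote-fraction features from crowd annotations, fits a logistic-regression stand-in on expert-adjudicated items, and predicts aggregated labels for items that lack expert adjudications; the data, feature choices, and model are all assumptions.

```python
# Hypothetical sketch: learn a mapping from crowd-vote features to
# expert-adjudicated labels, then apply it to unadjudicated items.

import numpy as np
from sklearn.linear_model import LogisticRegression

def vote_features(votes, classes=("pos", "neg")):
    """Turn one item's list of crowd labels into a feature vector:
    the fraction of votes for each class plus the total number of votes."""
    n = len(votes)
    return [votes.count(c) / n for c in classes] + [n]

# Items with expert adjudications (toy data) are used for training.
adjudicated = {
    "a": (["pos", "pos", "neg"], "pos"),
    "b": (["neg", "neg", "neg"], "neg"),
    "c": (["pos", "neg", "neg"], "neg"),
    "d": (["pos", "pos", "pos"], "pos"),
}
X = np.array([vote_features(v) for v, _ in adjudicated.values()])
y = np.array([label for _, label in adjudicated.values()])

model = LogisticRegression().fit(X, y)

# Items without expert labels: predict an aggregated label from crowd votes alone.
unadjudicated = {"e": ["pos", "neg", "pos"], "f": ["neg", "pos", "neg", "neg"]}
X_new = np.array([vote_features(v) for v in unadjudicated.values()])
print(dict(zip(unadjudicated, model.predict(X_new))))
```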